On Subgraph Isomorphism 



Sergey Gubin * 






Abstract - The article explicitly expresses Subgraph
Isomorphism by a polynomial size asymmetric linear
system.

Keywords: Subgraph Isomorphism, Linear modeling,
Algorithm, Computational Complexity, NP-complete

Introduction 

In 1988, Yannakakis proved [1] that the Traveling
Salesman Problem's (TSP) polytope cannot be expressed by
a polynomial size symmetric linear program, where
symmetry means that the polytope is invariant under node
relabeling. Because TSP is an NP-complete problem [2],
the theorem holds for all NP-complete problems. The
question about the size of asymmetric linear models was
left open in [1] and it has remained open since.

This article answers that question. We present an
explicit polynomial size asymmetric linear model for
Subgraph Isomorphism (SubGI). Since SubGI is an
NP-complete problem [3], this result is complementary to the
Yannakakis theorem.

The polynomial size asymmetric linear system is built
on an arbitrary but fixed labeling of the graphs
involved - hence the system's asymmetry. The polynomial
size of the system is achieved by immersing the problem
in a space of higher dimension, where variables represent
relabeling possibilities for vertex couples.

We illustrate our method with several examples. In
particular, we explicitly present polynomial size asymmetric
linear programs for TSP and for the Satisfiability
Problem for conjunctive normal forms (SAT).



1 Subgraph Isomorphism

Let G be a given graph - we will call it an input. Let
S be another given graph - we will call it a pattern. The
problem is whether G contains a subgraph which is
isomorphic to S. For any given couple of graphs (G, S), this
decision problem is a SubGI instance. Its size can be
estimated by the number of vertices in graph G.

Any graph may be seen as a relation. So, SubGI may
be seen as a finite version of the following general
problem: whether a given relation possesses a given property.
That explains the theoretical and practical importance of
SubGI.

SubGI is an NP-complete problem [3]. The Ullmann
algorithm [1] is the best known method to solve the problem.
Yet it and other known general methods are inefficient.
To date, efficient methods are known only for
particular types of graph couples (G, S) [3], [6], [7] and
others.

This article describes a reduction of SubGI to a system
of linear equations and inequalities. The reduction's
computational complexity and the resulting system's size are
polynomial in the size of SubGI. For a given couple of
graphs (G, S), the resulting system has solutions iff input
G contains a subgraph isomorphic to pattern S.

As well as for graphs, our reduction works for (multi)
digraphs with (multi) loops. We will present the
reduction for the multi digraph version of SubGI which is, in
many cases, more practical. So, input G and pattern S
are (multi) digraphs with (multi) loops everywhere below
in this article.

Because our system contains a polynomial number of
linear equations and inequalities with a polynomial
number of unknowns, it can be solved in polynomial time by,
for example, the Khachiyan ellipsoid algorithm [8, 9].

2 Base polytope

Let n be a natural number. Let the following variables
be unknowns:

\[
\begin{aligned}
x_{ij\mu\nu}, \quad & i, j, \mu, \nu = 1, 2, \ldots, n, \quad i \neq j, \ \mu \neq \nu \\
y_{ii\mu\mu}, \quad & i, \mu = 1, 2, \ldots, n
\end{aligned}
\]

In the case of n = 1, variables $x_{ij\mu\nu}$ are missing.
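Let's consider the following linear system:

\[
\begin{cases}
x_{ij\mu\nu} = x_{ji\nu\mu}, & i, j, \mu, \nu = 1, 2, \ldots, n, \ i \neq j, \ \mu \neq \nu \\
\sum_{\mu=1,\, \mu \neq \nu}^{n} x_{ij\mu\nu} = y_{jj\nu\nu}, & i, j, \nu = 1, 2, \ldots, n, \ i \neq j \\
\sum_{i=1,\, i \neq j}^{n} x_{ij\mu\nu} = y_{jj\nu\nu}, & j, \mu, \nu = 1, 2, \ldots, n, \ \mu \neq \nu \\
\sum_{\nu=1}^{n} y_{jj\nu\nu} = 1, & j = 1, 2, \ldots, n \\
x_{ij\mu\nu} \geq 0, \ y_{jj\nu\nu} \geq 0
\end{cases}
\tag{1}
\]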

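The system can be described with the following box
matrix of size n x n:

\[
B = \begin{pmatrix}
Y_{1,1} & X_{1,2} & \cdots & X_{1,n} \\
X_{2,1} & Y_{2,2} & \cdots & X_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n,1} & X_{n,2} & \cdots & Y_{n,n}
\end{pmatrix}_{n \times n}
\]

The i-th diagonal box in box matrix B is the following
diagonal matrix:

\[
Y_{i,i} = \mathrm{diag}(y_{i,i,1,1}, \ y_{i,i,2,2}, \ \ldots, \ y_{i,i,\nu,\nu}, \ \ldots, \ y_{i,i,n,n})
\]

The (i, j)-th off-diagonal box in box matrix B is the
following matrix:

\[
X_{i,j} = \begin{pmatrix}
0 & x_{i,j,1,2} & \cdots & x_{i,j,1,n} \\
x_{i,j,2,1} & 0 & \cdots & x_{i,j,2,n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{i,j,n,1} & x_{i,j,n,2} & \cdots & 0
\end{pmatrix}
\]

System (1) reflects the following relations between elements
of box matrix B: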

(1) B is a symmetric matrix: $X_{ij} = X_{ji}^T$;

(2) The total of each column in matrix $X_{ij}$ does not
depend on i but only on box column j and on the column
in this box column. The total is the corresponding element
in diagonal matrix $Y_{jj}$;

(3) The total over i of elements $x_{ij\mu\nu}$ does not depend
on $\mu$ but only on box column j and on the column $\nu$ in
this box column. This total is the corresponding element in
matrix $Y_{jj}$ - element $y_{jj\nu\nu}$;

(4) The (j, $\nu$)-th columns in off-diagonal boxes $X_{ij}$ of
box matrix B constitute a doubly stochastic matrix
multiplied by element $y_{jj\nu\nu}$;

(5) The total of all elements in matrix $Y_{jj}$ is equal to 1;

(6) Due to the matrix's symmetry, all of the above is true
in the horizontal direction, too.

System (1) always has solutions. The following solution
is minimal in the sense of Euclidean norm - we call it a
center:

\[
x_{ij\mu\nu} = \frac{1}{n(n-1)}, \qquad y_{jj\nu\nu} = \frac{1}{n}
\]

Obviously, the set of all solutions of system (1) is a convex
set. Also, because system (1) is a linear system, the set is
a polytope. We call this polytope a base polytope.

The following solution of system (1) is a vertex of the base
polytope: there is one and only one non-zero element in
each box $Y_{ii}$ and $X_{ij}$. Obviously, all non-zero elements
in the boxes are equal to 1 and they are arranged in a grid
of elements in matrix B, one element per box. We call
any such solution of system (1) a solution grid.
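Both the center and a solution grid can be checked against system (1) mechanically. The following is a minimal sketch in Python, not the article's own code. It assumes 0-based indices, stores the unknowns in dictionaries keyed by (i, j, mu, nu) and (j, nu), and takes the equation groups of system (1) to be the symmetry, column-total, total-over-i and normalization relations listed above; all function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def center(n):
    """The 'center' point: x = 1/(n(n-1)), y = 1/n."""
    x = {(i, j, m, v): Fraction(1, n * (n - 1))
         for i, j, m, v in product(range(n), repeat=4) if i != j and m != v}
    y = {(j, v): Fraction(1, n) for j, v in product(range(n), repeat=2)}
    return x, y

def grid_point(perm):
    """0/1 point that selects element (perm[i], perm[j]) in box (i, j)."""
    n = len(perm)
    x = {(i, j, m, v): int(m == perm[i] and v == perm[j])
         for i, j, m, v in product(range(n), repeat=4) if i != j and m != v}
    y = {(j, v): int(v == perm[j]) for j, v in product(range(n), repeat=2)}
    return x, y

def satisfies_system_1(n, x, y):
    """Check every equation group and the non-negativity of system (1)."""
    idx = range(n)
    # Symmetry of the box matrix: x[i,j,mu,nu] == x[j,i,nu,mu].
    symmetric = all(x[i, j, m, v] == x[j, i, v, m] for (i, j, m, v) in x)
    # Column totals inside each off-diagonal box.
    column_totals = all(sum(x[i, j, m, v] for m in idx if m != v) == y[j, v]
                        for i in idx for j in idx for v in idx if i != j)
    # Totals over the box row index i.
    totals_over_i = all(sum(x[i, j, m, v] for i in idx if i != j) == y[j, v]
                        for j in idx for m in idx for v in idx if m != v)
    # Each diagonal box sums to 1.
    normalized = all(sum(y[j, v] for v in idx) == 1 for j in idx)
    non_negative = all(val >= 0 for val in list(x.values()) + list(y.values()))
    return (symmetric and column_totals and totals_over_i
            and normalized and non_negative)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities are tested without rounding error.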

The following lemma shows that all vertices of the base
polytope are solution grids.

Lemma 1 Any solution of system (1) is a convex
combination of solution grids.

Proof System (1) consists of linear equations and the
following inequalities:

\[
x_{ij\mu\nu} \geq 0, \qquad y_{jj\nu\nu} \geq 0
\]

- where all indexes are in their appropriate ranges.
The linear equations have solutions - the center, for
example. The solutions constitute an affine subspace
in the linear space of all $n^2 \times n^2$ matrices with real
elements. Thus, vertices of the base polytope are those
points in the subspace where the number of
variables which equal 0 is maximal possible. It so
happens that these points are the solution grids,
QED.

3 Compatibility matrix

Let digraphs G and S be the given SubGI instance
(G, S). Let $V_G$ and $V_S$ be the vertex sets of the input and
pattern, respectively. Obviously,

\[
|V_G| \geq |V_S|
\]

- the instance would have resolution "NO" otherwise.
Now, let's add $|V_G| - |V_S|$ isolated vertices to pattern
S. Let's preserve notation S for the resulting pattern. Let
n be the number of vertices in the input and the pattern
after the addition of isolated vertices:

\[
n = |V_G| = |V_S|
\]

Obviously, the SubGI instance (G, S) emerging after the
addition of isolated vertices has the same resolution as
the original instance.

Let's arbitrarily label/enumerate vertices in input G
and pattern S. Let $A_G$ and $A_S$ be the adjacency
matrices of the input and pattern corresponding to the labeling.
Obviously, SubGI instance (G, S) has resolution "YES"
iff there exists such a relabeling of pattern S that all
elements of matrix $A_S$ emerging after that relabeling will be
less than or equal to the corresponding elements of matrix
$A_G$. In other words, SubGI instance (G, S) has
resolution "YES" iff the following integral quadratic system has
a solution:

\[
A_G \geq X A_S X^T \tag{2}
\]

- where X is the unknown permutation matrix of size
n x n. For two matrices $A = (a_{ij})$ and $B = (b_{ij})$ of the
same size, relation $A \geq B$ means that
$\forall i, j \ (a_{ij} \geq b_{ij})$. Permutation matrix X presents
the unknown vertex relabeling of pattern S after which the
existence of an input's subgraph isomorphic to S has to become
self-evident. Obviously, such a relabeling of S exists iff G has
at least one subgraph isomorphic to S.

To solve system (2), let's build the following matrix which
we call a compatibility matrix.
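For a fixed relabeling, system (2) is a direct entrywise comparison. A minimal sketch in Python, assuming 0-based vertex labels, adjacency matrices stored as nested lists, and a relabeling given as a tuple `v` in which pattern vertex `i` receives label `v[i]` (the function names are illustrative and not from the article):

```python
def relabel(a_s, v):
    """Adjacency matrix of the pattern after vertex i is given label v[i].

    This equals X * A_S * X^T for the permutation matrix X with X[v[i]][i] = 1.
    """
    n = len(a_s)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[v[i]][v[j]] = a_s[i][j]
    return out

def satisfies_system_2(a_g, a_s, v):
    """True iff A_G >= X A_S X^T holds entrywise for the relabeling v."""
    relabeled = relabel(a_s, v)
    n = len(a_g)
    return all(a_g[m][k] >= relabeled[m][k]
               for m in range(n) for k in range(n))
```

For example, with input arc 1 -> 2 and pattern arc 2 -> 1, only the transposition satisfies the system, which matches the "Arc vs arc" instance discussed in the examples.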



Let the input and pattern's adjacency matrices be as
follows:

\[
A_G = (g_{\mu\nu})_{n \times n}, \qquad A_S = (s_{ij})_{n \times n}
\]

For each couple of the pattern's vertices, let's build a
compatibility box. The compatibility box for vertices with labels
i and j is the following matrix $C_{ij} = (e_{ij\mu\nu})_{n \times n}$:

\[
e_{ij\mu\nu} = \begin{cases}
1, & s_{ij} \leq g_{\mu\nu} \wedge s_{ji} \leq g_{\nu\mu} \\
0, & s_{ij} > g_{\mu\nu} \vee s_{ji} > g_{\nu\mu}
\end{cases}
\tag{3}
\]

Compatibility box $C_{ij}$ shows all possible re-enumerations
for the pattern's vertices i and j, disregarding the rest
of the pattern's vertices. Obviously, compatibility boxes
$C_{ii}$ are diagonal matrices. And all diagonal elements in
compatibility boxes $C_{ij}$, $i \neq j$, are equal to 0.
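Definition (3) translates into a few lines of code. The sketch below is an illustration under stated assumptions rather than the article's implementation: vertices are 0-based, adjacency matrices are nested lists of non-negative integers, and the two structural remarks above (diagonal boxes are diagonal, off-diagonal boxes have a zero diagonal) are enforced explicitly by the test `(i == j) != (mu == nu)`.

```python
def compatibility_box(a_g, a_s, i, j):
    """Compatibility box C_ij for pattern vertices i and j, per definition (3)."""
    n = len(a_g)
    box = [[0] * n for _ in range(n)]
    for mu in range(n):
        for nu in range(n):
            # Assumption taken from the text: the same pattern vertex goes to
            # the same input vertex, and distinct vertices go to distinct ones.
            if (i == j) != (mu == nu):
                continue
            if a_s[i][j] <= a_g[mu][nu] and a_s[j][i] <= a_g[nu][mu]:
                box[mu][nu] = 1
    return box
```

Applied to the directed 3-cycle as input and the path 1 -> 2 -> 3 as pattern, box (1, 2) coincides with the input's adjacency matrix, and box (1, 3) is filled with 1 everywhere off the diagonal.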

The compatibility matrix for SubGI instance (G, S) is
the following box matrix:

\[
C = (C_{ij})_{n \times n}
\]

The compatibility matrix aggregates all compatibility
boxes in accordance with their indexes.

Obviously, integral quadratic system (2) has a solution iff
in the compatibility matrix there is a grid of elements,
one element per compatibility box, in which all elements
are equal to 1:

\[
\gamma = \{\, e_{ij\mu\nu} = 1 \mid \mu = v(i), \ \nu = v(j) \,\}
\]

- where $\gamma$ is the grid and v(i) is the input label assigned
to pattern vertex i. Any such grid of elements in
compatibility matrix C we call a solution grid, too (see
footnote 2).

Lemma 2 SubGI instance (G, S) has resolution "YES"
iff compatibility matrix C contains a solution grid.

Proof Any solution grid defines a vertex relabeling of S
which satisfies system (2), QED.
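Lemma 2 can be illustrated on tiny instances by enumerating all relabelings and keeping those that form a solution grid. This is a brute-force sketch with exponential running time, meant only as a sanity check and not as the article's method. It assumes 0-based labels and nested-list adjacency matrices; `compatible` restates definition (3) together with the diagonal convention described for the compatibility boxes.

```python
from itertools import permutations

def compatible(a_g, a_s, i, j, mu, nu):
    """Element e_{ij mu nu} of the compatibility matrix as a boolean."""
    if (i == j) != (mu == nu):
        return False
    return a_s[i][j] <= a_g[mu][nu] and a_s[j][i] <= a_g[nu][mu]

def solution_grids(a_g, a_s):
    """All relabelings v whose grid consists of elements equal to 1."""
    n = len(a_g)
    return [v for v in permutations(range(n))
            if all(compatible(a_g, a_s, i, j, v[i], v[j])
                   for i in range(n) for j in range(n))]
```

An empty result corresponds to resolution "NO", and a non-empty one to "YES".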

Footnote 2: In the next section, we will show that the
solution grids from this section and the solution grids from
the previous section are the same.

4 Linear model for SubGI

The similarities between compatibility matrix C and
the base polytope B are obvious. Due to lemmas 1 and
2, we can decide about the existence/absence of solution
grids in matrix C by searching matrix B for solution grids
subject to the following constraints:

\[
x_{ij\mu\nu} = 0, \qquad y_{ii\mu\mu} = 0 \tag{4}
\]

- where indexes are the indexes of all those elements of
compatibility matrix C which are equal to 0.

Then, lemmas 1 and 2 imply the following polynomial
size asymmetric linear model for SubGI.

Theorem SubGI instance (G, S) has resolution "YES"
iff the aggregated system (1) and (4) has a solution.

Proof Any solution of the aggregated system is a convex
combination of solution grids. There is a solution grid iff
the resolution for instance (G, S) is "YES", QED.

System (1) ∪ (4) consists of $O(n^4)$ linear equations and
inequalities with $O(n^4)$ unknowns. The existence/absence
of the system's solutions can be detected using the
ellipsoid algorithm [8, 9]. Because all coefficients of the
system are 0 or 1, the ellipsoid algorithm will solve this
system in strongly polynomial time.

Let's notice that constraints (4) explicitly involve the input
and pattern's vertex labeling through their adjacency
matrices - see definition (3) of the compatibility boxes. Thus,
system (1) ∪ (4) is an asymmetric linear system. It can be
seen that the system's solutions constitute a convex
subset of the Birkhoff polytope [10] in $R^{n^2}$. Vertex relabeling
of digraphs G and S will rotate that subset all over the
polytope.
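Constraints (4) can be listed mechanically from definition (3), and the number of unknowns can be counted directly. The sketch below assumes 0-based indices and nested-list adjacency matrices. The names `zero_constraints` and `unknown_count` are hypothetical helpers, not part of the article.

```python
from itertools import product

def zero_constraints(a_g, a_s):
    """Index tuples (i, j, mu, nu) of the unknowns that constraints (4) set to 0.

    A tuple with i == j and mu == nu stands for the unknown y_{ii mu mu};
    a tuple with i != j and mu != nu stands for the unknown x_{ij mu nu}.
    """
    n = len(a_g)
    zeros = []
    for i, j, mu, nu in product(range(n), repeat=4):
        if (i == j) != (mu == nu):
            continue  # not an unknown of system (1)
        if a_s[i][j] > a_g[mu][nu] or a_s[j][i] > a_g[nu][mu]:
            zeros.append((i, j, mu, nu))
    return zeros

def unknown_count(n):
    """Number of x and y unknowns in system (1): (n(n-1))^2 + n^2."""
    return (n * (n - 1)) ** 2 + n * n
```

The count grows as n^4, in line with the size estimate given for the aggregated system.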

5 Examples 

Let's use our method and resolve the following SubGI
instances.

Vertex vs vertex: Let input and pattern have just one
vertex each:

\[
A_G = (g_{1,1})_{1 \times 1}, \qquad A_S = (s_{1,1})_{1 \times 1}
\]

System (1) for n = 1 looks as follows:

\[
y_{1,1,1,1} = 1
\]

Constraints (4) for the instance look as follows:

\[
y_{1,1,1,1} = \begin{cases}
1, & s_{1,1} \leq g_{1,1} \\
0, & s_{1,1} > g_{1,1}
\end{cases}
\]

Thus, the resolution for this SubGI instance is
"YES" iff there is no excess of loops in the pattern:

\[
s_{1,1} \leq g_{1,1}
\]

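Arc vs arc: Let input and pattern be just arcs:

G: 1 -> 2, S: 1 <- 2

For n = 2, system (1) looks as follows:

\[
\begin{cases}
x_{1,2,1,2} = x_{2,1,2,1}, \quad x_{1,2,2,1} = x_{2,1,1,2} \\
x_{1,2,1,2} = y_{2,2,2,2} \\
x_{1,2,2,1} = y_{2,2,1,1} \\
x_{2,1,1,2} = y_{1,1,2,2} \\
x_{2,1,2,1} = y_{1,1,1,1} \\
y_{1,1,1,1} + y_{1,1,2,2} = 1 \\
y_{2,2,1,1} + y_{2,2,2,2} = 1 \\
x_{ij\mu\nu} \geq 0, \ y_{jj\nu\nu} \geq 0
\end{cases}
\tag{5}
\]

Constraints (4) for the given input and pattern may be
presented as follows:

\[
y_{1,1,1,1} = 0, \qquad y_{2,2,2,2} = 0
\]

The aggregated system has a solution:

\[
\begin{aligned}
x_{1,2,2,1} = y_{2,2,1,1} = x_{2,1,1,2} = y_{1,1,2,2} &= 1 \\
x_{1,2,1,2} = y_{2,2,2,2} = x_{2,1,2,1} = y_{1,1,1,1} &= 0
\end{aligned}
\]

Thus, the resolution for the given SubGI instance is
"YES". The corresponding relabeling of the pattern is
transposition (1,2).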

Arc vs loop: Let input and pattern be an arc and a
loop, respectively:

G: 1 -> 2, S: 1 -> 1

Adding to S one isolated vertex with index 2 will
produce the case of n = 2. System (1) for the case is
system (5), and constraints (4) for the instance may be
presented as follows:

\[
y_{1,1,1,1} = 0, \qquad y_{1,1,2,2} = 0
\]

The aggregated system has no solutions:

\[
1 = y_{1,1,1,1} + y_{1,1,2,2} = 0
\]

Thus, the resolution for the given SubGI instance is
"NO".

Arc/loop vs loop/arc: Let input and pattern be the
following digraphs:

G: 1 -> 2 -> 2, S: 1 -> 1 -> 2

System (1) for the case is system (5), and constraints (4)
for the instance may be presented as follows:

\[
y_{1,1,1,1} = 0, \qquad x_{1,2,2,1} = 0
\]

The aggregated system has no solutions:

\[
1 = y_{1,1,2,2} = x_{1,2,2,1} = 0
\]

Thus, the resolution for the given SubGI instance is
"NO".

Edge vs arc: Let input and pattern be the following
digraphs:

G: 1 -> 2 -> 1, S: 1 -> 2

System (1) for the case is system (5), and there are no
constraints (4) for the instance. Thus, the aggregated
system consists of system (5) alone. The center of its
solutions is the following point:

\[
\forall i, j, \mu, \nu \ \left( x_{ij\mu\nu} = y_{ii\nu\nu} = \tfrac{1}{2} \right)
\]

Thus, the resolution for the given SubGI instance is
"YES".



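Cycle vs edge: Let input and pattern be the following
digraphs:

G: 1 -> 2 -> 3 -> 1, S: 1 -> 2 -> 1

Compatibility matrix C for this SubGI instance
looks as follows:

\[
C = \left(\begin{array}{ccc|ccc|ccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ \hline
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\ \hline
0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1
\end{array}\right)
\]

Compatibility boxes entirely filled with 0 will
produce constraints (4) incompatible with system (1), i.e.
the aggregated system (1) and (4) will have no solutions.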
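The incompatibility can be spelled out in one line. Assuming system (1) contains the column-total equations and the normalization of each diagonal box, as stated in relations (2) and (5) for box matrix B, the zero box $C_{1,2}$ forces every unknown of box $X_{1,2}$ to vanish, and then:

\[
x_{1,2,\mu,\nu} = 0 \ \ \forall \mu \neq \nu
\ \Longrightarrow \
y_{2,2,\nu,\nu} = \sum_{\mu \neq \nu} x_{1,2,\mu,\nu} = 0 \ \ \forall \nu
\ \Longrightarrow \
\sum_{\nu=1}^{3} y_{2,2,\nu,\nu} = 0 \neq 1
\]

The same argument applies to any instance in which some compatibility box contains no element equal to 1.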
Thus, the resolution for the given SubGI instance is 
"NO". 

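Cycle vs path: Let input and pattern be the following
digraphs:

G: 1 -> 2 -> 3 -> 1, S: 1 -> 2 -> 3

Compatibility matrix C for this SubGI instance
looks as follows:

\[
C = \left(\begin{array}{ccc|ccc|ccc}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\ \hline
0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ \hline
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{array}\right)
\]

Constraints (4) produced by this compatibility matrix
are compatible with system (1), i.e. the aggregated
system has solutions. The system has three solution grids;
they relabel the pattern's vertices (1, 2, 3) as
(1, 2, 3), (2, 3, 1) and (3, 1, 2), respectively. Thus, the
resolution for the given SubGI instance is "YES".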

Cycle vs cycle: Let input and pattern be the following
digraphs:

G: 1 -> 2 -> 3 -> 4 -> 1, S: 1 -> 2 -> 3 -> 1

Compatibility matrix C for this SubGI instance
looks as follows, written box by box:

\[
C = \begin{pmatrix}
I & P & P^T & J - I \\
P^T & I & P & J - I \\
P & P^T & I & J - I \\
J - I & J - I & J - I & I
\end{pmatrix},
\qquad
P = A_G = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{pmatrix}
\]

- where I is the 4 x 4 identity matrix and J is the 4 x 4
matrix entirely filled with 1.

Constraints (4) produced by this compatibility
matrix are incompatible with system (1), i.e. the
aggregated system has no solutions. To see that, let's
apply system (1) to the compatibility matrix as
constraints on its elements. To satisfy these constraints
at least partially, the fourth box column and the
fourth box row of the compatibility matrix have to
be trimmed/depleted as follows:

\[
\begin{pmatrix}
I & P & P^T & P^2 \\
P^T & I & P & P^2 \\
P & P^T & I & P^2 \\
P^2 & P^2 & P^2 & I
\end{pmatrix},
\qquad
P = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{pmatrix},
\qquad
P^2 = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
\]

- where I is the 4 x 4 identity matrix and P is the
adjacency matrix of the input cycle.

After this depletion, the fact that the fourth box
column contradicts the third group of equations
in system (1) becomes obvious. Thus, the resolution
for the given SubGI instance is "NO".

6 Linear program for TSP

Let input G be an arc-weighted digraph, i.e. let each
arc in G have a weight. TSP is a problem of finding a
Hamiltonian cycle in G with the minimal total weight
(see footnote 3). That is an NP-complete problem [2].

The pattern S for TSP is any circular permutation
matrix, for example:

\[
S = \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}_{n \times n}
\]

Let's construct system (4) for SubGI instance (G, S).
Then, aggregated linear system (1) and (4) will
express the Hamiltonian Cycle Problem, which is an
NP-complete problem [3], as well.

Let $w(\mu, \nu)$ be a weight function - the weight of the
arc from vertex $\mu$ into vertex $\nu$ in input G. As usual,
let $w(\mu, \nu) = +\infty$ for non-adjacent vertices. Then,
the following asymmetric polynomial size linear
program will express TSP:

\[
\sum_{i \neq j} \ \sum_{\mu \neq \nu} s_{ij} \, w(\mu, \nu) \, x_{ij\mu\nu} \to \min
\]

- subject to constraints (1) and (4).

From the practical perspective, let's notice that we
do not require function $w(\mu, \nu)$ to be positive.

Footnote 3: Because G is a digraph, we actually consider
here the Asymmetric Traveling Salesman Problem (ATSP).
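At a solution grid, the objective reduces to the total weight of the images of the pattern's arcs. The sketch below evaluates that value for a given relabeling and, for comparison on tiny inputs only, minimizes it by brute force over all relabelings. It is an illustration under assumptions (0-based labels, the weight function stored as a nested list `w`, the example pattern with arcs i -> i-1), not the linear program itself, and the function names are hypothetical.

```python
from itertools import permutations

def circular_pattern(n):
    """Adjacency matrix of one directed n-cycle with arcs i -> i-1 (0 -> n-1)."""
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        s[i][(i - 1) % n] = 1
    return s

def grid_cost(w, s, v):
    """Total weight of the input arcs onto which relabeling v maps the pattern's arcs."""
    n = len(s)
    return sum(w[v[i]][v[j]] for i in range(n) for j in range(n) if s[i][j])

def brute_force_tsp(w):
    """Minimum grid cost over all relabelings (exponential, for tiny inputs only)."""
    n = len(w)
    s = circular_pattern(n)
    return min(grid_cost(w, s, v) for v in permutations(range(n)))
```

With `float("inf")` as the weight of missing arcs, any relabeling that is not a Hamiltonian cycle of the input gets an infinite cost.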

7 Linear model for SAT 

In 1971, Cook [3] found that with a polynomial
number of operations any non-deterministic Turing
machine (NDTM) can be expressed by the corresponding
conjunctive normal form (CNF): the question of
whether there is an acceptable input is a question
of whether the corresponding CNF is satisfiable. That
made SAT the first NP-complete problem, because it
is an NP problem and the very words "NP problem"
mean a problem which can be solved by an NDTM in
polynomial time. In 1973, Levin [11] independently
repeated the result in terms of search. In 1972, Karp
[2] selected SAT as a root of NP-completeness
theory: a problem is NP-complete if SAT can be reduced
to that problem in polynomial time, and vice versa.
Let f be a given CNF:

\[
f = c_1 \wedge c_2 \wedge \ldots \wedge c_m
\]

- where clause $c_i$ is a disjunction of $k_i$ literals - some
Boolean variables or their negations. Formula f
defines an instance of SAT: whether there is such a
truth assignment to the involved Boolean variables
which would make f = true.

Ultimately, we could apply the distributive laws
and rewrite formula f in a disjunctive form (DF).
That would reduce SAT to an existence problem for
implicants in the emerging DF. This last problem
can be easily expressed as a SubGI instance.

Let's enumerate literals in each of the clauses in
formula f. For each couple of clauses $(c_i, c_j)$, let's
build a compatibility box: the $(\alpha, \beta)$-element in the
matrix is 0 or 1 depending on whether the $\alpha$-th literal
in clause $c_i$ and the $\beta$-th literal in clause $c_j$ are
complementary. Let's aggregate all these compatibility
boxes in a box matrix. Obviously, there is an
implicant in the DF of f iff there is a grid of elements in
the box matrix, one element per compatibility box,
whose elements are all equal to 1. Each such grid of
elements consists of the couples of literals which
participate in an implicant.
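The clause boxes and the grid condition can be sketched as follows. The code assumes a DIMACS-like encoding that is not part of the article: a literal is a non-zero integer, and negation is the sign change. The grid search is a plain brute force over one literal per clause, shown only to make the definition concrete; the function names are illustrative.

```python
from itertools import product

def clause_box(ci, cj):
    """Box for clauses ci, cj: entry is 0 for complementary literals, else 1."""
    return [[0 if a == -b else 1 for b in cj] for a in ci]

def find_grid(clauses):
    """Return one literal position per clause forming a grid of 1s, or None."""
    m = len(clauses)
    boxes = [[clause_box(clauses[i], clauses[j]) for j in range(m)]
             for i in range(m)]
    for pick in product(*(range(len(c)) for c in clauses)):
        if all(boxes[i][j][pick[i]][pick[j]]
               for i in range(m) for j in range(m)):
            return pick
    return None
```

A returned tuple names, for each clause, a literal that can be made true simultaneously with the others, so the CNF is satisfiable; `None` means no such choice exists.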

The box matrix built in such a way may be seen as
input G. Then, pattern S may be a box matrix of
the same structure as G but whose boxes are entirely
filled with 0 except their upper-left-corner elements,
which are equal to 1:

\[
S = (S_{ij}), \qquad
S_{ij} = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{pmatrix}_{k_i \times k_j}
\]

There is one obvious restriction on the relabeling of
S: the elements of boxes $S_{ij}$ are not allowed to leave
their boxes. This restriction can be accommodated
in system (4) with a polynomial number of additional
linear constraints.

Conclusion 

We described a polynomial time reduction of SubGI
to a polynomial size asymmetric linear system. The
system consists of systems (1) and (4). Subsystem (1)
depends on the size of the SubGI instance only.
Subsystem (4) describes the structure of the given input
and pattern. The system's asymmetry is due to the
explicit involvement of the input and pattern's
adjacency matrices in the construction of system (4) - see
definition (3). So, the result may be seen as
complementary to the Yannakakis theorem [1].

Linear system (1) ∪ (4) defines a sub-polytope in the
Birkhoff polytope. Vertices of this sub-polytope are
those permutation matrices which satisfy quadratic
integral system (2). Relabeling of the input and
pattern rotates this sub-polytope all over the Birkhoff
polytope.

Ultimately, system (1) ∪ (4) may be seen as a parallel
testing of all guesses, where guesses are n x n
permutation matrices - the unknowns in system (2).
Basically, this parallelization was achieved by encoding
SubGI in the contradictions between relabeling
possibilities for different vertices.

Obviously, the described "continuous" solution of
SubGI is not unique. Also, we could develop a
polynomial time discrete algorithm which would search
the compatibility matrix for the solution grids as,
for example, it was done in [12] for 3SAT, which is an
NP-complete problem [3], too (see footnote 4).

Footnote 4: For 3SAT, see a demo at http://www.timescube.com



